Questions

To secure a playoff berth, the Knicks need to win enough games to finish in the top eight of the Eastern Conference. We therefore start by exploring the distribution of wins among NBA playoff teams to find the threshold number of wins needed to reach the playoffs. We then look for the factors that contribute to total wins in a season and to the outcome of a single game. Assuming that total wins are independent across teams and seasons, and that the result of one game is independent of another, we fit a linear regression of total wins on average performance predictors from both the offensive and the defensive side. We also use the model to predict the Knicks' performance in the 2021-22 season and see whether the team can reach the playoffs. Based on the models, we analyze the Knicks' performance on the key predictors to measure the gap between the Knicks and the top teams. We then drill down from team level to player level to find how the leading players should improve to win more games. Finally, we propose a detailed game and training strategy.

  1. What is the threshold number of wins needed to reach the playoffs?

  2. Which variables contribute to total wins in a season, and how do they affect the result of a single game?

  3. If we use our model to predict the Knicks' ranking with data from the new season, will the team reach the playoffs?

  4. How does the Knicks' performance differ from the league average?

1). How does the Knicks' three-point shooting differ from the league average?

2). Are there any obvious weaknesses in their three-point shooting?

  5. What can the Knicks do to improve in other areas, such as assists and rebounds, to secure a playoff spot?

Data

Data Source

As our project needs detailed stats about NBA teams and players from the last 10 NBA regular seasons, we used web scraping to collect official advanced data from NBA Stats. We mainly used four datasets:

  1. Advanced Box Score: each observation represents one game and its stats, including the score, total field goal attempts, three-pointers made and so on.

  2. Playtype by Team: season averages for each team by offensive play type, such as isolation, pick and roll, etc. Each observation represents a team's average in one regular season for a specific play type.

  3. Tracking: detailed information about NBA teams' average movement data in a regular season, for example passes and touches.

  4. Knicks Shooting Log: each observation represents a field goal made by a Knicks player, including the player who made the shot, the shot location and the time remaining when the shot was made.

As mentioned above, these datasets were scraped from the NBA website. The code used to scrape the data can be found at scrapping data.

Data Wrangling (ref for scraping)

First, the datasets on NBA Stats are not offered through a public API, so we wrote a function that requests them with the headers found through the browser's developer tools.

library(tidyverse)
library(httr)      # GET(), add_headers(), content()
library(jsonlite)  # fromJSON()

# Request one NBA Stats endpoint and return its first result set as a data frame.
scrapping_data = function(url) {
  headers = c(
  `Connection` = 'keep-alive',
  `Accept` = 'application/json, text/plain, */*',
  `x-nba-stats-token` = 'true',
  `X-NewRelic-ID` = 'VQECWF5UChAHUlNTBwgBVw==',
  `User-Agent` = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36', 
  `x-nba-stats-origin` = 'stats',
  `Sec-Fetch-Site` = 'same-origin',
  `Sec-Fetch-Mode` = 'cors',
  `Referer` = 'https://stats.nba.com/players/leaguedashplayerbiostats/',
  `Accept-Encoding` = 'gzip, deflate, br',
  `Accept-Language` = 'en-US,en;q=0.9')
  response = GET(url, add_headers(.headers = headers))
  data = fromJSON(content(response, as = "text"))
  # rowSet holds the rows; headers holds the matching column names
  df = data.frame(data$resultSets$rowSet[[1]], stringsAsFactors = FALSE)
  names(df) = tolower(data$resultSets$headers[[1]])
  return(df)
}

# Drop the last column only when it has no name. The earlier misspelling
# `stringAsFactors` appears to be what added such an unnamed extra column.
drop_last_column = function(df) {
  if (is.na(names(df)[[ncol(df)]])) {
    df = df[, -ncol(df), drop = FALSE]
  }
  return(df)
}
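As a rough offline illustration of the parsing step, the sketch below rebuilds a data frame from a header vector and a row matrix. The `result_set` list is a hand-made stand-in for one parsed result set, and its team codes and values are invented. The real response layout is assumed from the function above.

```r
# Hand-made stand-in for one parsed result set (all values are invented).
result_set <- list(
  headers = c("TEAM_ABBREVIATION", "WL", "PTS"),
  rowSet = matrix(c("AAA", "W", "110",
                    "BBB", "L", "98"),
                  ncol = 3, byrow = TRUE)
)

# Rows come back as a character matrix, so build the data frame first
# and attach the lower-cased header names afterwards.
df <- data.frame(result_set$rowSet, stringsAsFactors = FALSE)
names(df) <- tolower(result_set$headers)

# Columns are still character at this point; convert the numeric ones.
df$pts <- as.numeric(df$pts)

print(df)
```

In the report's pipeline the type conversion is left to read_csv when the saved files are read back.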

Then, we apply the function to each dataset, select the variables we need and save them locally for further use. Here is one example, box_score_all.

season_years = c("2020-21", "2019-20", 
           "2018-19", "2017-18", 
           "2016-17", "2015-16", 
           "2014-15", "2013-14", 
           "2012-13", "2011-12", 
           "2010-11", "2009-10", 
           "2008-09", "2007-08", 
           "2006-07", "2005-06", 
           "2004-05", "2003-04", 
           "2002-03", "2001-02")

box_score_all = tibble(
  season_year = season_years,
  url = str_c("https://stats.nba.com/stats/teamgamelogs?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlusMinus=N&Rank=N&Season=", season_year, "&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&VsConference=&VsDivision="),
  box_score = map(url, scrapping_data)) %>% 
  mutate(box_score = map(box_score, drop_last_column)) %>% # last column of each box score is NA
  select(-season_year, -url) %>% 
  unnest(cols = box_score)

write_csv(box_score_all, "./data2/box_score_all.csv")
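The season labels and per-season URLs above can also be generated rather than typed out. The sketch below is one possible way to do it in base R. The shortened `base_url` is a hypothetical stand-in used only for illustration; the real request needs the full query string shown above.

```r
# Build labels such as "2020-21" from the starting year of each season.
start_years <- 2020:2001
season_years <- sprintf("%d-%02d", start_years, (start_years + 1) %% 100)

# Hypothetical shortened endpoint, for illustration only.
base_url <- "https://stats.nba.com/stats/teamgamelogs?Season="
urls <- paste0(base_url, season_years, "&SeasonType=Regular+Season")

print(head(season_years, 3))
print(urls[1])
```

The `%02d` format keeps the leading zero in labels such as "2008-09".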

Then, we read the saved files and select the relevant variables. The raw datasets are as follows:

box_score_all = read_csv("./data2/box_score_all.csv") %>% 
  janitor::clean_names() %>% 
  select(-contains("rank"))

pass_df = 
  read_csv("./data2/pass_df.csv") %>% 
  select(season_year, team_abbreviation, passes_made)

isol_df = 
  read_csv("./data2/isol_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_iso = poss)

prbh_df = 
  read_csv("./data2/prbh_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_prb = poss)

prrm_df = 
  read_csv("./data2/prrm_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_prr = poss)

defend_df = 
  read_csv("./data2/defensive_impact_df.csv") %>% 
  select(season_year, team_abbreviation, stl, blk, dreb)

trans_df = 
  read_csv("./data2/transition_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_trans = poss)

  • box_score_all: statistics for individual games, 47830 observations in total over the last 20 years. Variables include win or loss, rebounds, assists, three-point attempts and so on.
  • pass_df: average passes per game by team and season.
  • defend_df: average defensive variables per game by team and season.
  • isol_df: average isolations per game by team and season.
  • trans_df: average transitions per game by team and season.
  • prbh_df: average pick and rolls for the ball handler by team and season.
  • prrm_df: average pick and rolls for the roll man by team and season.

After reading these semi-raw datasets, we prepare four data frames for exploratory data analysis and regression modeling.

1.avg_df

avg_df = 
  box_score_all %>% 
  select(season_year, team_abbreviation, wl, pts, ast, tov, fgm, fga, fg3m, fg3a) %>%
  mutate(
    win = case_when(wl == "W" ~ 1, TRUE~0),
    game_num = 1,
    fg3a_p = round(fg3a/fga, digits = 3),
    team_abbreviation = str_replace(team_abbreviation, "NOH", "NOP"), 
    team_abbreviation = str_replace(team_abbreviation, "NJN", "BKN"),
    conference = case_when(
      team_abbreviation %in% c("UTA","PHX","LAC","DEN","DAL","LAL","POR","GSW","SAS","MEM","NOP","SAC","MIN","OKC","HOU","SEA","NOK","CHH")~"west",
      team_abbreviation %in% c("PHI","BKN","MIL","ATL","NYK","MIA","BOS","IND","WAS","CHI","TOR","CLE","ORL","DET","CHA")~"east") # divide into east and west conference
    ) %>% 
  group_by(season_year, team_abbreviation, conference) %>% 
  summarise(
    wins = sum(win), 
    games = sum(game_num), 
    games_should = 82, 
    pts_avg = round(mean(pts), digits = 1), 
    ast_avg = round(mean(ast), digits = 1),
    tov_avg = round(mean(tov), digits = 1),
    fgm_total = sum(fgm), 
    fga_total = sum(fga), 
    fg3m_total = sum(fg3m), 
    fg3a_total = sum(fg3a)
    ) %>% 
  mutate(wins_revised = round(wins/games*games_should,0)) %>% # due to labor negotiation in 2011-12, COVID-19.
  relocate(season_year, team_abbreviation, conference, wins, wins_revised, everything()) %>% 
  arrange(desc(season_year),desc(wins)) %>% 
  mutate(fg3_p = fg3a_total/fga_total, fg3_r = fg3m_total/fg3a_total) %>% 
  group_by(season_year,conference) %>% 
  mutate(
    conf_rank = row_number(),
    play_off_team = case_when(
           conf_rank <= 8 ~ "playoff", 
           conf_rank > 8 ~ "non-playoff"
         ), 
         play_off_team = fct_relevel(play_off_team, c("playoff", "non-playoff")))

avg_df contains the number of wins by team and season over the last 20 years. It is used to analyze the distribution of the playoff win threshold and to build predict_df for regression. Because box_score_all is recorded by game, we need to summarise it into season totals and averages. The detailed wrangling process is:

1). Select the variables we are interested in from box_score_all.

2). Define four new variables:

  • win: 1 for a win and 0 for a loss.
  • game_num: the constant 1, summed to count games.
  • fg3a_p: the share of field goal attempts that are three-point attempts.
  • conference: west or east.

3). Summarise and rank by team and season:

  • calculate total wins, total games, total field goal attempts, total field goals made, total three-point field goal attempts and total three-point field goals made.
  • calculate average points, average assists and average turnovers.
  • rescale the number of wins to an 82-game schedule for the lockout season (2011-12) and the COVID-19 seasons.
  • move the key variables to the front.
  • arrange all data by season and number of wins.
  • calculate each team's rank within the Eastern or Western Conference for each season.
  • mark whether or not the team reached the playoffs that season.
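The rescaling of wins to an 82-game schedule is plain arithmetic and can be sketched as below. The helper name `revise_wins` and the win-loss records are hypothetical; only the formula `round(wins / games * 82)` comes from the code above.

```r
# Scale wins from a shortened season to an 82-game schedule.
revise_wins <- function(wins, games, games_should = 82) {
  round(wins / games * games_should, 0)
}

# Hypothetical records from a 66-game, a 72-game and a full season.
short_66 <- revise_wins(36, 66)  # 36 / 66 * 82 = 44.73, rounds to 45
short_72 <- revise_wins(41, 72)  # 41 / 72 * 82 = 46.69, rounds to 47
full_82  <- revise_wins(50, 82)  # a full season is left unchanged

print(c(short_66, short_72, full_82))
```

This keeps shortened seasons comparable with full ones when looking for a win threshold.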

2.predict_df

predict_df = 
  avg_df %>%
  left_join(defend_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(prrm_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(prbh_df, by = c("season_year","team_abbreviation")) %>%
  left_join(isol_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(pass_df, by = c("season_year","team_abbreviation")) %>%
  left_join(trans_df, by = c("season_year","team_abbreviation")) %>% 
  drop_na(poss_trans, passes_made, poss_iso, poss_prb, poss_prr, stl, blk, dreb) %>% 
  mutate(
    poss_pr = poss_prr + poss_prb
  ) %>% 
  select(-poss_prr, -poss_prb, -wins, -games, -games_should, -fgm_total, -fga_total)

predict_df contains the average performance data together with the total number of games won in the last 8 years. It is used to build models and predict the number of wins. Besides the fundamental average stats from avg_df, such as points, assists and turnovers, we also want play-type and defensive data in the model, so we join avg_df with six other data frames.
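The effect of chaining left joins and then dropping incomplete rows can be seen on a toy example. This sketch uses base R `merge()` and `complete.cases()` in place of `left_join()` and `drop_na()`. The team code "AAA" and all numbers are invented.

```r
# Season-level table with three seasons for one hypothetical team.
avg_toy <- data.frame(
  season_year = c("2012-13", "2019-20", "2020-21"),
  team_abbreviation = "AAA",
  wins_revised = c(50, 30, 45)
)

# Play-type table that covers only the two most recent seasons.
trans_toy <- data.frame(
  season_year = c("2019-20", "2020-21"),
  team_abbreviation = "AAA",
  poss_trans = c(18.2, 19.5)
)

# all.x = TRUE keeps every row of avg_toy, like a left join.
joined <- merge(avg_toy, trans_toy,
                by = c("season_year", "team_abbreviation"), all.x = TRUE)

# Rows without play-type data are removed, like drop_na().
predict_toy <- joined[complete.cases(joined), ]

print(nrow(joined))
print(nrow(predict_toy))
```

Seasons with no play-type record survive the join with missing values and are removed only by the final filter, which is how the modeling table ends up covering fewer seasons than avg_df.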

3.box_score_viz

box_score_viz = 
  box_score_all %>% 
  filter(season_year %in% c("2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21")) %>% 
  mutate(team_abbreviation = str_replace(team_abbreviation, "NOH", "NOP"), 
    team_abbreviation = str_replace(team_abbreviation, "NJN", "BKN")) %>% 
  select(season_year, team_abbreviation, wl, pts, ast, tov, fgm, fga, fg3m, fg3a, stl, blk, dreb) %>%
  mutate(
    win = case_when(wl == "W" ~ 1, TRUE~0),
    game_num = 1, 
    conference = case_when(
      team_abbreviation %in% c("UTA","PHX","LAC","DEN","DAL","LAL","POR","GSW","SAS","MEM","NOP","SAC","MIN","OKC","HOU","SEA","NOK","CHH")~"west",
      team_abbreviation %in% c("PHI","BKN","MIL","ATL","NYK","MIA","BOS","IND","WAS","CHI","TOR","CLE","ORL","DET","CHA")~"east"), # divide into east and west conference
    fg3a_p = round(fg3a/fga, digits = 3),
    fg3_r = round(fg3m/fg3a, digits = 3)
    ) %>% 
  relocate(season_year, team_abbreviation, conference)

conf_rank = 
  avg_df %>% 
  filter(season_year %in% c("2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21")) %>% 
  ungroup() %>% 
  select(season_year, team_abbreviation, conference, conf_rank)


#join the two table together
box_score_viz = 
  box_score_viz %>% 
  left_join(conf_rank, by = c("season_year", "team_abbreviation", "conference")) %>% 
  mutate(play_off_team = case_when(
           conf_rank <= 8 ~ "playoff", 
           conf_rank > 8 ~ "non-playoff"
         ), 
         play_off_team = fct_relevel(play_off_team, c("playoff", "non-playoff")), 
         fg3p = fg3m / fg3a) %>% 
  relocate(season_year, team_abbreviation, conference, play_off_team)

This data frame contains 23476 observations from the last 10 seasons (2011-12 to 2020-21) and is mainly used to analyze how different stats trend between playoff and non-playoff teams.

To add the rank of a team, we joined it with conf_rank, which is taken from avg_df.
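The ranking and labelling logic can be sketched on a single made-up conference. The base R code below stands in for the dplyr steps; the team names and win totals are hypothetical. It also shows how the eighth seed's wins give a playoff threshold for that season.

```r
# Ten hypothetical teams in one conference and season.
season_toy <- data.frame(
  team = paste0("T", 1:10),
  wins = c(52, 49, 47, 45, 44, 41, 40, 38, 36, 30)
)

# Rank by wins, highest first; ties would be broken by row order.
season_toy <- season_toy[order(-season_toy$wins), ]
season_toy$conf_rank <- seq_len(nrow(season_toy))
season_toy$play_off_team <- ifelse(season_toy$conf_rank <= 8,
                                   "playoff", "non-playoff")

# Wins of the eighth seed: the threshold for this toy season.
threshold <- season_toy$wins[season_toy$conf_rank == 8]

# Attach the season-level label to per-game rows.
games_toy <- data.frame(team = c("T1", "T9", "T9"), pts = c(110, 98, 104))
games_toy <- merge(games_toy,
                   season_toy[, c("team", "conf_rank", "play_off_team")],
                   by = "team", all.x = TRUE)

print(threshold)
print(games_toy)
```

Every game of a team carries the same label, because the label describes where the team finished that season and not the individual game.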

4.regre_df

regre_df = 
  box_score_all %>%
  select(-c(1:7)) %>%
  select(-ends_with("rank")) %>%
  mutate(wl = recode(wl, "W" = 1, "L" = 0),
         wl = as.factor(wl)) 

This data frame contains per-game stats and is used for logistic regression. We drop the variables the model does not need and convert win or loss to a factor variable.
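For readers unfamiliar with the model type, here is a minimal sketch of a logistic regression on a factor outcome, fitted to simulated data. The data, the single predictor and the effect size are all made up, and this is not the model fitted in this report.

```r
set.seed(1)

# Simulate 500 games in which a higher field goal percentage raises
# the chance of winning (the effect size is an arbitrary choice).
n <- 500
fg_pct <- rnorm(n, mean = 0.46, sd = 0.05)
win_prob <- plogis(20 * (fg_pct - 0.46))
games_sim <- data.frame(
  fg_pct = fg_pct,
  wl = factor(rbinom(n, 1, win_prob), levels = c(0, 1))
)

# With a factor response, glm() treats the first level (0) as a loss.
fit_logit <- glm(wl ~ fg_pct, data = games_sim, family = binomial)

# Predicted win probability for a poor and a good shooting night.
p <- predict(fit_logit, newdata = data.frame(fg_pct = c(0.40, 0.50)),
             type = "response")

print(coef(fit_logit))
print(p)
```

Converting wl to a factor, as done for regre_df, is what lets `glm()` with `family = binomial` treat the outcome as win versus loss.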

Data Description

We fit two regression models with the data above. One fits the number of wins in a season with average performance. The other fits the win or loss of a single game with the stats from that game.

1.Predict the number of wins

The dependent variable is the number of wins by team and season, denoted by wins_revised.

Independent variables are selected from both the offensive and the defensive side.

The typical attributes of the "small ball era" are more three-point shooting and a faster pace. So we select the following variables to represent the offensive level of a team:

  • fg3_p: proportion of field goal attempts that are three-point attempts
  • fg3_r: three-point shooting percentage
  • pts_avg: average points per game
  • tov_avg: average number of turnovers per game
  • ast_avg: average number of assists per game
  • poss_trans: average number of transitions
  • passes_made: average number of passes per game
  • poss_iso: average number of isolations per game
  • poss_pr: average number of pick and rolls

For the defensive level, the variables are:

  • stl: average steals per game
  • blk: average blocks per game
  • dreb: average defensive rebounds per game
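To show the shape of the season-level model, the sketch below fits a linear regression of wins on three of the listed predictors using simulated data. The coefficients used to generate the data are arbitrary assumptions, so the output says nothing about the real NBA relationships.

```r
set.seed(2)

# Simulate 120 team-seasons with plausible ranges for three predictors.
n <- 120
teams_sim <- data.frame(
  fg3_r   = runif(n, 0.32, 0.40),  # three-point percentage
  tov_avg = runif(n, 12, 17),      # turnovers per game
  dreb    = runif(n, 30, 37)       # defensive rebounds per game
)

# Arbitrary data-generating rule plus noise (an assumption, not a finding).
teams_sim$wins_revised <- round(
  -60 + 250 * teams_sim$fg3_r - 2 * teams_sim$tov_avg +
    1.2 * teams_sim$dreb + rnorm(n, sd = 3)
)

fit_lm <- lm(wins_revised ~ fg3_r + tov_avg + dreb, data = teams_sim)

# Predicted wins for one hypothetical team profile.
new_team <- data.frame(fg3_r = 0.39, tov_avg = 13, dreb = 35)
pred_wins <- predict(fit_lm, newdata = new_team)

print(coef(fit_lm))
print(pred_wins)
```

The same `predict()` call, applied to a team's current season averages, is the kind of step that would produce a projected win total to compare against the playoff threshold.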

2.Predict the win or loss of a game

The following variables are reasonable candidates for the regression model:

  • wl: Win/Loss
  • min: Minutes Played
  • pts: Points
  • fgm: Field Goals Made
  • fga: Field Goals Attempted
  • fg_pct: Field Goal Percentage
  • fg3m: 3 Point Field Goals Made
  • fg3a: 3 Point Field Goals Attempted
  • fg3_pct: 3 Point Field Goals Percentage
  • ftm: Free Throws Made
  • fta: Free Throws Attempted
  • ft_pct: Free Throw Percentage
  • oreb: Offensive Rebounds
  • dreb: Defensive Rebounds
  • reb: Rebounds
  • ast: Assists
  • stl: Steals
  • blk: Blocks
  • blka: Blocked Field Goal Attempts (the team's own shots blocked by the opponent)
  • tov: Turnovers
  • pf: Personal Fouls
  • pfd: Personal Fouls Drawn
  • plus_minus: Plus-Minus

Exploratory Analysis

In this part, we explore which variables differ between teams that reached the playoffs and teams that did not. This gives us some insight into choosing potential parameters for model building. Specifically, we identify the trend in three-point attempts over the past 10 seasons.

Difference in Average Scores

First, we look at how points per game were distributed over the last 10 seasons for teams that reached the playoffs and teams that did not, as shown in the figure below.

library(ggridges)

# One ridge per season and team type; the line in each ridge marks the mean.
box_score_viz %>% 
  ggplot(aes(x = pts, y = season_year, fill = play_off_team)) + 
  geom_density_ridges(scale = .8, alpha = .5, 
                      quantile_lines = TRUE, quantile_fun = mean) + 
  # fill is now a mapped aesthetic, so the manual scale and legend apply
  scale_fill_manual(name = "Team", 
                    values = c("playoff" = "blue", "non-playoff" = "salmon")) + 
  xlim(65, 140) + 
  labs(x = "Scores", 
       y = "Season Year", 
       title = "Score Distribution Between Playoff and Non-Playoff Teams")

Two things stand out.

First, over the last 10 regular seasons, points per game show an increasing trend. Second, teams that reached the playoffs have higher average scores than teams that did not.

The second tendency is easy to understand: playoff teams outscore non-playoff teams because higher scores let them win more. The rise in average score across all NBA teams is due to the small ball revolution, in which teams speed up, take more shots and increase their share of three-point attempts.

Three Point Parameters

Next, we use the box score data and the team average data to dig into potential variables that contribute to winning games.

The share of three-point attempts among all field goal attempts clearly increased over the last 10 seasons, which corresponds to the "Small Ball Revolution" and to our finding that points per game increased over the same period. In addition, teams that reached the playoffs take a larger share of three-point attempts during a game, which suggests that the three-point attempt percentage might contribute to the number of wins.

The three-point shooting percentage is also higher among playoff teams than among non-playoff teams, plausibly because a higher shooting percentage leads to more points in a game. Another tendency in this plot is that the variation in three-point shooting percentage narrows over time. This may reflect the attention teams now pay to three-point shooting: if players train more on shooting, their shooting becomes more stable.
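The two three-point measures are simple ratios, but the per-game version (as in box_score_viz) and the season version built from totals (as in avg_df) are not interchangeable. The toy numbers below are invented to show the difference.

```r
# Two hypothetical games for one team.
games_toy <- data.frame(
  fga  = c(80, 100),  # field goal attempts
  fg3a = c(20, 40),   # three-point attempts
  fg3m = c(5, 16)     # three-pointers made
)

# Per-game share of attempts taken from three: 0.25 and 0.40.
per_game_fg3a_p <- games_toy$fg3a / games_toy$fga

# Season-level ratios built from totals.
season_fg3_p <- sum(games_toy$fg3a) / sum(games_toy$fga)   # 60 / 180
season_fg3_r <- sum(games_toy$fg3m) / sum(games_toy$fg3a)  # 21 / 60

print(mean(per_game_fg3a_p))  # mean of per-game ratios
print(season_fg3_p)           # ratio of season totals
print(season_fg3_r)
```

The ratio of totals weights games with more attempts more heavily, so it can differ from the mean of the per-game ratios.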

Three-Point Field Goal Attempt Percentage

plot_ly(box_score_viz, x = ~ season_year, y = ~ fg3a_p, color = ~ play_off_team, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Three-Point Field Goal Attempt Percentage"))

Three-Point Shooting Rate

plot_ly(box_score_viz, x = ~ season_year, y = ~ fg3p, color = ~ play_off_team, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Three Pointer Rate"))

Offensive Play Type

Then, we explore the influence of play types on average wins. If a play type clearly contributes to a team's wins, we would suggest that the Knicks design more offense of that type.

The average number of isolations per game for playoff teams has been almost constant from the 2013-14 season to now, while that of non-playoff teams tended to decrease over time. The grouping of superstars might account for this: superstars can run more isolation plays, so as superstars joined playoff teams, the isolations of non-playoff teams decreased.

Pick and roll is a common form of offensive teamwork. The Pick and Roll plot shows that average pick and rolls per game tended to increase over the last 8 years, and that playoff teams ran fewer of them than non-playoff teams, which is consistent with the isolation pattern.

Transition means that the defending team launches a fast break immediately after getting a rebound or a steal, before the opposing defense is set. It is an important way to speed up the game and score easily and quickly. Average transitions rose gradually, likely because they are more efficient. Non-playoff teams seemed to run more transitions than playoff teams. That does not mean that more transitions lead to fewer wins. More likely, non-playoff teams are weaker in half-court offense and therefore rely more on transitions.

Isolation

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(iso_mean = mean(poss_iso)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Isolation: ", round(iso_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ iso_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Isolation"))

Pick and Roll

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(pr_mean = mean(poss_pr)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Pick and Roll: ", round(pr_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ pr_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Pick and Roll"))

Transition

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(trans_mean = mean(poss_trans)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Transition: ", round(trans_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ trans_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Transition"))

Season Average Movement Parameters

Blocks are a key defensive parameter. More blocks mean that opponents have a lower chance to score. Playoff teams recorded more blocks than non-playoff teams.

Steals are also a defensive parameter and go together with opponents' turnovers. There was no apparent trend in steals over time.

Too many turnovers can cost a team the game. The turnover plot shows that playoff teams averaged fewer turnovers than non-playoff teams.

Defensive rebounds prevent the opponent's second-chance attempts and so reduce its score. As we can see, playoff teams grabbed more defensive rebounds than non-playoff teams.

The number of passes per game reflects offensive fluency. An adequate number of passes can create good shooting opportunities, but many passes that fail to create a good shot indicate weak offensive ability. In the passes plot, non-playoff teams had higher average passes per game than playoff teams.

Block

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(blk_mean = mean(blk)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Block: ", round(blk_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ blk_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Block"))

Steal

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(stl_mean = mean(stl)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Steal: ", round(stl_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ stl_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Steal"))

Turnover

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(tov_mean = mean(tov_avg)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Turnover: ", round(tov_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ tov_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Turnover"))

Defensive Rebound

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(dreb_mean = mean(dreb)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Defensive Rebound: ", round(dreb_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ dreb_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Defensive Rebound"))

Passes

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(passes_mean = mean(passes_made)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Passes: ", round(passes_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ passes_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Passes"))

Variables That Can Affect the Result of a Game

Field Goal Percentage

plot_ly( box_score_all, x = ~ season_year, y = ~ fg_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Field Goal Percent"))

3 Point Field Goals Percentage

plot_ly(box_score_all, x = ~ season_year, y = ~ fg3_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "3 Point Field Goals Percentage"))

Free Throw Percentage

plot_ly(box_score_all, x = ~ season_year, y = ~ ft_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Free Throw Percent"))

Offensive Level Parameters

Average offensive rebounds per game

offensive_df %>% 
  group_by(season_year, wl) %>% 
  summarise(oreb= mean(oreb)) %>% 
  plot_ly(x = ~ season_year, y = ~ oreb, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average offensive rebounds of each game"))

Average assists per game

offensive_df %>% 
  group_by(season_year, wl) %>% 
  summarise(ast= mean(ast)) %>% 
  plot_ly(x = ~ season_year, y = ~ ast, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average assists of each game"))

Defensive Level Parameters

Steals of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(stl= mean(stl)) %>% 
  plot_ly(x = ~ season_year, y = ~ stl, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average steals of each game"))

Blocks of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(blk= mean(blk)) %>% 
  plot_ly(x = ~ season_year, y = ~ blk, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average blocks of each game"))

Defensive rebounds of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(dreb= mean(dreb)) %>% 
  plot_ly(x = ~ season_year, y = ~ dreb, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average defensive rebounds of each game"))

Turnovers of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(tov= mean(tov)) %>% 
  plot_ly(x = ~ season_year, y = ~ tov, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average turnovers of each game"))

Personal foul of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(pf= mean(pf)) %>% 
  plot_ly(x = ~ season_year, y = ~ pf, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average personal fouls of each game"))

Shiny App

Conclusion and Discussion